reaction diagram
MolMole: Molecule Mining from Scientific Literature
LG AI Research, Chun, Sehyun, Kim, Jiye, Jo, Ahra, Jo, Yeonsik, Oh, Seungyul, Lee, Seungjun, Ryoo, Kwangrok, Lee, Jongmin, Kim, Seung Hwan, Kang, Byung Jun, Lee, Soonyoung, Park, Jun Ha, Moon, Chanwoo, Ham, Jiwon, Lee, Haein, Han, Heejae, Byun, Jaeseung, Do, Soojong, Ha, Minju, Kim, Dongyun, Bae, Kyunghoon, Lim, Woohyung, Lee, Edward Hwayoung, Park, Yongmin, Yu, Jeongsang, Jo, Gerrard Jeongwon, Hong, Yeonjung, Yoo, Kyungjae, Han, Sehui, Lee, Jaewan, Park, Changyoung, Jeon, Kijeong, Yi, Sihyuk
The extraction of molecular structures and reaction data from scientific documents is challenging due to their varied, unstructured chemical formats and complex document layouts. To address this, we introduce MolMole, a vision-based deep learning framework that unifies molecule detection, reaction diagram parsing, and optical chemical structure recognition (OCSR) into a single pipeline for automating the extraction of chemical data directly from page-level documents. Recognizing the lack of a standard page-level benchmark and evaluation metric, we also present a test set of 550 pages annotated with molecule bounding boxes, reaction labels, and MOLfiles, along with a novel evaluation metric. Experimental results demonstrate that MolMole outperforms existing toolkits on both our benchmark and public datasets. The benchmark test set will be publicly available, and the MolMole toolkit will soon be accessible through an interactive demo on the LG AI Research website. For commercial inquiries, please contact us at contact_ddu@lgresearch.ai.
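To make the "single pipeline" idea concrete, here is a minimal sketch of one way the outputs of the three stages could be joined: boxes produced by a reaction-diagram parser are linked to molecule detections by intersection over union (IoU), and each linked box takes the structure string (for example a SMILES string) that an OCSR model produced for that detection. The `Box` type, the `link_reactions` function, the role names, and the 0.5 IoU threshold are all hypothetical illustrations, not MolMole's actual implementation or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (0.0 if they do not overlap)."""
    ix = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    iy = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = ix * iy
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def link_reactions(
    reactions: List[Dict[str, List[Box]]],
    molecules: List[Tuple[Box, str]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Dict[str, List[Optional[str]]]]:
    """Replace each reaction box with the structure string of the best-overlapping
    detected molecule, or None if no detection overlaps enough (e.g. a text label)."""
    linked = []
    for rxn in reactions:
        out: Dict[str, List[Optional[str]]] = {}
        for role, boxes in rxn.items():
            out[role] = []
            for box in boxes:
                # Pick the detection with the highest IoU against this reaction box.
                best = max(molecules, key=lambda m: iou(box, m[0]), default=None)
                if best is not None and iou(box, best[0]) >= threshold:
                    out[role].append(best[1])
                else:
                    out[role].append(None)
        linked.append(out)
    return linked
```

Keeping detection, OCSR, and reaction parsing as separate stages joined by geometry means each model can be evaluated or swapped independently; a real system would likely need a smarter assignment (for example one-to-one matching) than this greedy per-box choice.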
RxnScribe: A Sequence Generation Model for Reaction Diagram Parsing
Qian, Yujie, Guo, Jiang, Tu, Zhengkai, Coley, Connor W., Barzilay, Regina
Reaction diagram parsing is the task of extracting reaction schemes from a diagram in the chemistry literature. The reaction diagrams can be arbitrarily complex, thus robustly parsing them into structured data is an open challenge. In this paper, we present RxnScribe, a machine learning model for parsing reaction diagrams of varying styles. We formulate this structured prediction task with a sequence generation approach, which condenses the traditional pipeline into an end-to-end model. We train RxnScribe on a dataset of 1,378 diagrams and evaluate it with cross-validation, achieving an 80.0% soft match F1 score, with significant improvements over previous models. Our code and data are publicly available at https://github.com/thomas0809/RxnScribe.
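The sequence generation formulation can be illustrated with a Pix2Seq-style serialization, used here as a swapped-in illustration rather than RxnScribe's confirmed scheme: each bounding box is quantized into four integer coordinate tokens followed by a role token, and each reaction ends with a separator, so a single autoregressive decoder can emit a whole diagram as one flat token sequence. The token names (`[Rct]`, `[Cnd]`, `[Prd]`, `[Rxn]`), the 100-bin quantization, and the `encode`/`decode` helpers are hypothetical; RxnScribe's actual vocabulary and ordering may differ.

```python
from dataclasses import dataclass

# Hypothetical vocabulary: RxnScribe's real special tokens may differ.
ROLE_TOKENS = {"reactant": "[Rct]", "condition": "[Cnd]", "product": "[Prd]"}
TOKEN_ROLES = {tok: role for role, tok in ROLE_TOKENS.items()}
RXN_END = "[Rxn]"  # marks the end of one reaction
NUM_BINS = 100     # assumed coordinate quantization resolution


@dataclass(frozen=True)
class Box:
    """Bounding box with coordinates normalized to [0, 1]."""
    x1: float
    y1: float
    x2: float
    y2: float


def quantize(v: float) -> int:
    """Map a normalized coordinate to an integer bin in [0, NUM_BINS - 1]."""
    return min(max(int(v * NUM_BINS), 0), NUM_BINS - 1)


def dequantize(b: int) -> float:
    """Map a bin index back to the center of that bin."""
    return (b + 0.5) / NUM_BINS


def encode(reactions):
    """Flatten reactions (dicts of role -> list[Box]) into one token list."""
    tokens = []
    for rxn in reactions:
        for role in ("reactant", "condition", "product"):
            for box in rxn.get(role, []):
                tokens += [str(quantize(c)) for c in (box.x1, box.y1, box.x2, box.y2)]
                tokens.append(ROLE_TOKENS[role])
        tokens.append(RXN_END)
    return tokens


def decode(tokens):
    """Rebuild reactions from a token list; incomplete boxes are dropped."""
    reactions, current, coords = [], {}, []
    for tok in tokens:
        if tok == RXN_END:
            reactions.append(current)
            current, coords = {}, []
        elif tok in TOKEN_ROLES:
            if len(coords) == 4:  # only accept a box with all four coordinates
                box = Box(*(dequantize(b) for b in coords))
                current.setdefault(TOKEN_ROLES[tok], []).append(box)
            coords = []
        else:
            coords.append(int(tok))
    return reactions
```

For example, one reaction with a reactant box `Box(0.125, 0.25, 0.375, 0.5)` and a product box `Box(0.5, 0.25, 0.875, 0.75)` encodes to `12 25 37 50 [Rct] 50 25 87 75 [Prd] [Rxn]`. Quantizing coordinates keeps the output vocabulary finite, which is what lets a standard sequence decoder emit boxes at all; the cost is a localization error of at most half a bin width.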